crash frequency
Investigating Robotaxi Crash Severity with Geographical Random Forest and the Urban Environment
Jiao, Junfeng, Baik, Seung Gyu, Choi, Seung Jun, Xu, Yiming
This paper quantitatively investigates the crash severity of Autonomous Vehicles (AVs) with spatially localized machine learning and macroscopic measures of the urban built environment. Extending beyond the microscopic effects of individual infrastructure elements, we focus on the city-scale land use and behavioral patterns, while addressing spatial heterogeneity and spatial autocorrelation. We implemented a spatially localized machine learning technique called Geographical Random Forest (GRF) on the California AV collision dataset. Analyzing multiple urban measures, including points of interest, building footprint, and land use, we built a GRF model and visualized it as a crash severity risk map of San Francisco. This paper presents three findings. First, spatially localized machine learning outperformed regular machine learning in predicting AV crash severity. The bias-variance tradeoff was evident as we adjusted the localization weight hyperparameter. Second, land use was the most important predictor, compared to intersections, building footprints, public transit stops, and Points Of Interest (POIs). Third, AV crashes were more likely to result in low-severity incidents in city center areas with greater diversity and commercial activities, than in residential neighborhoods. Residential land use is likely associated with higher severity due to human behavior and less restrictive environments. Counterintuitively, residential areas were associated with higher crash severity, compared to more complex areas such as commercial and mixed-use areas. When robotaxi operators train their AV systems, it is recommended to: (1) consider where their fleet operates and make localized algorithms for their perception system, and (2) design safety measures specific to residential neighborhoods, such as slower driving speeds and more alert sensors.
Advanced Crash Causation Analysis for Freeway Safety: A Large Language Model Approach to Identifying Key Contributing Factors
Abdelrahman, Ahmed S., Abdel-Aty, Mohamed, Yang, Samgyu, Faden, Abdulrahman
Understanding the factors contributing to traffic crashes and developing strategies to mitigate their severity is essential. Traditional statistical methods and machine learning models often struggle to capture the complex interactions between various factors and the unique characteristics of each crash. This research leverages large language model (LLM) to analyze freeway crash data and provide crash causation analysis accordingly. By compiling 226 traffic safety studies related to freeway crashes, a training dataset encompassing environmental, driver, traffic, and geometric design factors was created. The Llama3 8B model was fine-tuned using QLoRA to enhance its understanding of freeway crashes and their contributing factors, as covered in these studies. The fine-tuned Llama3 8B model was then used to identify crash causation without pre-labeled data through zero-shot classification, providing comprehensive explanations to ensure that the identified causes were reasonable and aligned with existing research. Results demonstrate that LLMs effectively identify primary crash causes such as alcohol-impaired driving, speeding, aggressive driving, and driver inattention. Incorporating event data, such as road maintenance, offers more profound insights. The model's practical applicability and potential to improve traffic safety measures were validated by a high level of agreement among researchers in the field of traffic safety, as reflected in questionnaire results with 88.89%. This research highlights the complex nature of traffic crashes and how LLMs can be used for comprehensive analysis of crash causation and other contributing factors. Moreover, it provides valuable insights and potential countermeasures to aid planners and policymakers in developing more effective and efficient traffic safety practices.
Enhancing Crash Frequency Modeling Based on Augmented Multi-Type Data by Hybrid VAE-Diffusion-Based Generative Neural Networks
Chen, Junlan, He, Qijie, Liu, Pei, Ma, Wei, Pu, Ziyuan
Crash frequency modelling analyzes the impact of factors like traffic volume, road geometry, and environmental conditions on crash occurrences. Inaccurate predictions can distort our understanding of these factors, leading to misguided policies and wasted resources, which jeopardize traffic safety. A key challenge in crash frequency modelling is the prevalence of excessive zero observations, caused by underreporting, the low probability of crashes, and high data collection costs. These zero observations often reduce model accuracy and introduce bias, complicating safety decision making. While existing approaches, such as statistical methods, data aggregation, and resampling, attempt to address this issue, they either rely on restrictive assumptions or result in significant information loss, distorting crash data. To overcome these limitations, we propose a hybrid VAE-Diffusion neural network, designed to reduce zero observations and handle the complexities of multi-type tabular crash data (count, ordinal, nominal, and real-valued variables). We assess the synthetic data quality generated by this model through metrics like similarity, accuracy, diversity, and structural consistency, and compare its predictive performance against traditional statistical models. Our findings demonstrate that the hybrid VAE-Diffusion model outperforms baseline models across all metrics, offering a more effective approach to augmenting crash data and improving the accuracy of crash frequency predictions. This study highlights the potential of synthetic data to enhance traffic safety by improving crash frequency modelling and informing better policy decisions.
Heterogeneous Ensemble Learning for Enhanced Crash Forecasts -- A Frequentest and Machine Learning based Stacking Framework
Ahmad, Numan, Wali, Behram, Khattak, Asad J.
A variety of statistical and machine learning methods are used to model crash frequency on specific roadways with machine learning methods generally having a higher prediction accuracy. Recently, heterogeneous ensemble methods (HEM), including stacking, have emerged as more accurate and robust intelligent techniques and are often used to solve pattern recognition problems by providing more reliable and accurate predictions. In this study, we apply one of the key HEM methods, Stacking, to model crash frequency on five lane undivided segments (5T) of urban and suburban arterials. The prediction performance of Stacking is compared with parametric statistical models (Poisson and negative binomial) and three state of the art machine learning techniques (Decision tree, random forest, and gradient boosting), each of which is termed as the base learner. By employing an optimal weight scheme to combine individual base learners through stacking, the problem of biased predictions in individual base-learners due to differences in specifications and prediction accuracies is avoided. Data including crash, traffic, and roadway inventory were collected and integrated from 2013 to 2017. The data are split into training, validation, and testing datasets. Estimation results of statistical models reveal that besides other factors, crashes increase with density (number per mile) of different types of driveways. Comparison of out-of-sample predictions of various models confirms the superiority of Stacking over the alternative methods considered. From a practical standpoint, stacking can enhance prediction accuracy (compared to using only one base learner with a particular specification). When applied systemically, stacking can help identify more appropriate countermeasures.
Impact of Narrow Lanes on Arterial Road Vehicle Crashes: A Machine Learning Approach
Elhenawy, Mohammed, Jahangiri, Arash, Rakha, Hesham
In this paper we adopted state-of-the-art machine learning algorithms, namely: random forest (RF) and least squares boosting, to model crash data and identify the optimum model to study the impact of narrow lanes on the safety of arterial roads. Using a ten-year crash dataset in four cities in Nebraska, two machine learning models were assessed based on the prediction error. The RF model was identified as the best model. The RF was used to compute the importance of the lane width predictors in our regression model based on two different measures. Subsequently, the RF model was used to simulate the crash rate for different lane widths. The Kruskal-Wallis test, was then conducted to determine if simulated values from the four lane width groups have equal means. The test null hypothesis of equal means for simulated values from the four lane width groups was rejected. Consequently, it was concluded that the crash rates from at least one lane width group was statistically different from the others. Finally, the results from the pairwise comparisons using the Tukey and Kramer test showed that the changes in crash rates between any two lane width conditions were statistically significant.
A fatal point concept and a low-sensitivity quantitative measure for traffic safety analytics
The variability of the clusters generated by clustering techniques in the domain of latitude and longitude variables of fatal crash data are significantly unpredictable. This unpredictability, caused by the randomness of fatal crash incidents, reduces the accuracy of crash frequency (i.e., counts of fatal crashes per cluster) which is used to measure traffic safety in practice. In this paper, a quantitative measure of traffic safety that is not significantly affected by the aforementioned variability is proposed. It introduces a fatal point -- a segment with the highest frequency of fatality -- concept based on cluster characteristics and detects them by imposing rounding errors to the hundredth decimal place of the longitude. The frequencies of the cluster and the cluster's fatal point are combined to construct a low-sensitive quantitative measure of traffic safety for the cluster. The performance of the proposed measure of traffic safety is then studied by varying the parameter k of k-means clustering with the expectation that other clustering techniques can be adopted in a similar fashion. The 2015 North Carolina fatal crash dataset of Fatality Analysis Reporting System (FARS) is used to evaluate the proposed fatal point concept and perform experimental analysis to determine the effectiveness of the proposed measure. The empirical study shows that the average traffic safety, measured by the proposed quantitative measure over several clusters, is not significantly affected by the variability, compared to that of the standard crash frequency.